Auditory Teager energy cepstrum coefficients for robust speech recognition
نویسندگان
چکیده
In this paper, a feature extraction algorithm for robust speech recognition is introduced. The feature extraction algorithm is motivated by the human auditory processing and the nonlinear Teager-Kaiser energy operator that estimates the true energy of the source of a resonance. The proposed features are labeled as Teager Energy Cepstrum Coefficients (TECCs). TECCs are computed by first filtering the speech signal through a dense non constant-Q Gammatone filterbank and then by estimating the “true” energy of the signal’s source, i.e., the short-time average of the output of the Teager-Kaiser energy operator. Error analysis and speech recognition experiments show that the TECCs and the mel frequency cepstrum coefficients (MFCCs) perform similarly for clean recording conditions; while the TECCs perform significantly better than the MFCCs for noisy recognition tasks. Specifically, relative word error rate improvement of 60% over the MFCC baseline is shown for the Aurora-3 database for the high-mismatch condition. Absolute error rate improvement ranging from 5% to 20% is shown for a phone recognition task in (various types of additive) noise.
منابع مشابه
Robustness of Auditory Teager Energy Cepstrum Coefficients for Classification of Pathological and Normal Voices in Noisy Environments
This paper focuses on a robust feature extraction algorithm for automatic classification of pathological and normal voices in noisy environments. The proposed algorithm is based on human auditory processing and the nonlinear Teager-Kaiser energy operator. The robust features which labeled Teager Energy Cepstrum Coefficients (TECCs) are computed in three steps. Firstly, each speech signal frame ...
متن کاملInvestigating the Robustness of Teager Energy Cepstrum Coefficients for Emotion Recognition in Noisy Conditions
This paper investigated the robustness of Teager Energy Cepstrum Coefficient (TECC) in differentiating emotion categories for speech at different White Gaussian noise levels by comparing the performance with MFCC. Experiments involved the normalized squared error measurement, the multi-classes (four classes) emotion classification and the pair-wise emotion classification. This study included fo...
متن کاملMultiband, multisensor robust features for noisy speech recognition
This paper presents a novel feature extraction scheme taking advantage of both the nonlinear modulation speech model and the spatial diversity of speech and noise signals in a multisensor environment. Herein, we propose applying robust features to speech signals captured by a multisensor array minimizing a noise energy criterion over multiple frequency bands. We show that we can achieve improve...
متن کاملImproving the noise-robustness of mel-frequency cepstral coefficients for speech processing
In this paper we study the noise-robustness of mel-frequency cepstral coefficients (MFCCs) and explore ways to improve their performance in noisy conditions. Improvements based on a more accurate model of the early auditory system are suggested to make the MFCC features more robust to noise while preserving their class discrimination ability. Speech versus non-speech classification and speech r...
متن کاملNoise Suppression Based on Teager Energy Operator for Improving the Robustness of Asr Front-end
In this paper, we proposed a new noise suppression method based on Teager Energy Operator in advancing the noise robustness of speech recognition front-end. The presented method attempts to remove a distortion estimation in Teager energy domain, especially, a Teager energy estimation of noise signal is subtracted from the noisy speech signal. This approach differs significantly from the traditi...
متن کامل